Performance/Energy Optimization of DSP Transforms on the XScale Processor

Authors

  • Paolo D'Alberto
  • Markus Püschel
  • Franz Franchetti
Abstract

The XScale processor family provides user-controllable, independent scaling of CPU, bus, and memory frequencies. This feature introduces another handle for code optimization with respect to energy consumption or runtime performance. We quantify the effect of frequency configurations on both performance and energy for three signal processing transforms: the DFT, FIR filters, and the WHT. To do this, we use SPIRAL, a program generation system for signal processing transforms. For a given transform to be implemented, SPIRAL searches over different algorithms to find the best match to the given platform with respect to the chosen performance metric (usually runtime). In this paper we use SPIRAL to generate different implementations for different frequency configurations, optimized for runtime and for physically measured energy consumption. In doing so we show that, first, each transform achieves its best performance and energy consumption for a different system configuration; second, the best code depends on the architecture configuration, problem size, and algorithm; and third, the fastest implementation is not always the most energy efficient. Fourth, we introduce dynamic reconfiguration (i.e., reconfiguration during execution) to further improve performance and energy consumption. Finally, we benchmark SPIRAL-generated code against Intel’s vendor library routines. We show competitive results as well as 20% performance improvements or energy reductions for selected transforms and problem sizes.


Similar resources

ACT: A Low Power VLIW Cluster Coprocessor for DSP Applications

The ACT (Adaptive Cellular Telephony) coprocessor architecture is described and analyzed using a set of widely used DSP algorithms. Performance and power are compared to equivalent implementations on ASIC and embedded processor platforms. Flexibility is achieved by fine-grain program control of communication and execution resources. Compression techniques, simple addressing modes for large sing...


Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...


Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)

Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...


Energy on Demand: A Multimedia Gateway Methodology (EEC 282 Project Report)

One of the most important design issues for battery operated multimedia applications is low energy consumption. This paper proposes an energy optimization technique for latency and quality constrained video applications in a multimedia gateway. We propose different queue creation and scheduling algorithms to dynamically change the energy/distortion factor based on a client’s power consumption r...


Optimization and Benchmark of Vision Algorithms on a DSP

This paper presents our work on performance-optimized implementations of low-level vision algorithms on a Digital Signal Processor (DSP). The platform is a TI TMS320C6414 DSP running at 1 GHz and is therefore a cutting-edge DSP. Performance optimization steps for DSP implementations are shown. A final benchmark compares the DSP with Field Programmable Gate Array (FPGA) implementations.




Publication date: 2007